{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import warnings\n",
    "# Ignore numpy dtype warnings. These warnings are caused by an interaction\n",
    "# between numpy and Cython and can be safely ignored.\n",
    "# Reference: https://stackoverflow.com/a/40846742\n",
    "warnings.filterwarnings(\"ignore\", message=\"numpy.dtype size changed\")\n",
    "warnings.filterwarnings(\"ignore\", message=\"numpy.ufunc size changed\")\n",
    "\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "%matplotlib inline\n",
    "import ipywidgets as widgets\n",
    "from ipywidgets import interact, interactive, fixed, interact_manual\n",
    "\n",
    "sns.set()\n",
    "sns.set_context('talk')\n",
    "np.set_printoptions(threshold=20, precision=2, suppress=True)\n",
    "pd.options.display.max_rows = 7\n",
    "pd.options.display.max_columns = 8\n",
    "pd.set_option('precision', 2)\n",
    "# This option stops scientific notation for pandas\n",
    "# pd.set_option('display.float_format', '{:.2f}'.format)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Granularity\n",
    "\n",
    "The granularity of your data is what each record in your data represents. For example, in the Calls dataset each record represents a single case of a police call."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>CASENO</th>\n",
       "      <th>OFFENSE</th>\n",
       "      <th>CVLEGEND</th>\n",
       "      <th>BLKADDR</th>\n",
       "      <th>EVENTDTTM</th>\n",
       "      <th>Latitude</th>\n",
       "      <th>Longitude</th>\n",
       "      <th>Day</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>17091420</td>\n",
       "      <td>BURGLARY AUTO</td>\n",
       "      <td>BURGLARY - VEHICLE</td>\n",
       "      <td>2500 LE CONTE AVE</td>\n",
       "      <td>2017-07-23 06:00:00</td>\n",
       "      <td>37.88</td>\n",
       "      <td>-122.26</td>\n",
       "      <td>Sunday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>17038302</td>\n",
       "      <td>BURGLARY AUTO</td>\n",
       "      <td>BURGLARY - VEHICLE</td>\n",
       "      <td>BOWDITCH STREET &amp; CHANNING WAY</td>\n",
       "      <td>2017-07-02 22:00:00</td>\n",
       "      <td>37.87</td>\n",
       "      <td>-122.26</td>\n",
       "      <td>Sunday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>17049346</td>\n",
       "      <td>THEFT MISD. (UNDER $950)</td>\n",
       "      <td>LARCENY</td>\n",
       "      <td>2900 CHANNING WAY</td>\n",
       "      <td>2017-08-20 23:20:00</td>\n",
       "      <td>37.87</td>\n",
       "      <td>-122.25</td>\n",
       "      <td>Sunday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>17091319</td>\n",
       "      <td>THEFT MISD. (UNDER $950)</td>\n",
       "      <td>LARCENY</td>\n",
       "      <td>2100 RUSSELL ST</td>\n",
       "      <td>2017-07-09 04:15:00</td>\n",
       "      <td>37.86</td>\n",
       "      <td>-122.27</td>\n",
       "      <td>Sunday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>17044238</td>\n",
       "      <td>DISTURBANCE</td>\n",
       "      <td>DISORDERLY CONDUCT</td>\n",
       "      <td>TELEGRAPH AVENUE &amp; DURANT AVE</td>\n",
       "      <td>2017-07-30 01:16:00</td>\n",
       "      <td>37.87</td>\n",
       "      <td>-122.26</td>\n",
       "      <td>Sunday</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     CASENO                   OFFENSE            CVLEGEND  \\\n",
       "0  17091420             BURGLARY AUTO  BURGLARY - VEHICLE   \n",
       "1  17038302             BURGLARY AUTO  BURGLARY - VEHICLE   \n",
       "2  17049346  THEFT MISD. (UNDER $950)             LARCENY   \n",
       "3  17091319  THEFT MISD. (UNDER $950)             LARCENY   \n",
       "4  17044238               DISTURBANCE  DISORDERLY CONDUCT   \n",
       "\n",
       "                          BLKADDR            EVENTDTTM  Latitude  Longitude  \\\n",
       "0               2500 LE CONTE AVE  2017-07-23 06:00:00     37.88    -122.26   \n",
       "1  BOWDITCH STREET & CHANNING WAY  2017-07-02 22:00:00     37.87    -122.26   \n",
       "2               2900 CHANNING WAY  2017-08-20 23:20:00     37.87    -122.25   \n",
       "3                 2100 RUSSELL ST  2017-07-09 04:15:00     37.86    -122.27   \n",
       "4   TELEGRAPH AVENUE & DURANT AVE  2017-07-30 01:16:00     37.87    -122.26   \n",
       "\n",
       "      Day  \n",
       "0  Sunday  \n",
       "1  Sunday  \n",
       "2  Sunday  \n",
       "3  Sunday  \n",
       "4  Sunday  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "calls = pd.read_csv('data/calls.csv')\n",
    "calls.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the Stops dataset, each record represents a single incident of a police stop."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Incident Number</th>\n",
       "      <th>Call Date/Time</th>\n",
       "      <th>Location</th>\n",
       "      <th>Incident Type</th>\n",
       "      <th>Dispositions</th>\n",
       "      <th>Location - Latitude</th>\n",
       "      <th>Location - Longitude</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-00004825</td>\n",
       "      <td>2015-01-26 00:10:00</td>\n",
       "      <td>SAN PABLO AVE / MARIN AVE</td>\n",
       "      <td>T</td>\n",
       "      <td>M</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-00004829</td>\n",
       "      <td>2015-01-26 00:50:00</td>\n",
       "      <td>SAN PABLO AVE / CHANNING WAY</td>\n",
       "      <td>T</td>\n",
       "      <td>M</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-00004831</td>\n",
       "      <td>2015-01-26 01:03:00</td>\n",
       "      <td>UNIVERSITY AVE / NINTH ST</td>\n",
       "      <td>T</td>\n",
       "      <td>M</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-00004848</td>\n",
       "      <td>2015-01-26 07:16:00</td>\n",
       "      <td>2000 BLOCK BERKELEY WAY</td>\n",
       "      <td>1194</td>\n",
       "      <td>BM4ICN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-00004849</td>\n",
       "      <td>2015-01-26 07:43:00</td>\n",
       "      <td>1700 BLOCK SAN PABLO AVE</td>\n",
       "      <td>1194</td>\n",
       "      <td>BM4ICN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Incident Number      Call Date/Time                      Location  \\\n",
       "0   2015-00004825 2015-01-26 00:10:00     SAN PABLO AVE / MARIN AVE   \n",
       "1   2015-00004829 2015-01-26 00:50:00  SAN PABLO AVE / CHANNING WAY   \n",
       "2   2015-00004831 2015-01-26 01:03:00     UNIVERSITY AVE / NINTH ST   \n",
       "3   2015-00004848 2015-01-26 07:16:00       2000 BLOCK BERKELEY WAY   \n",
       "4   2015-00004849 2015-01-26 07:43:00      1700 BLOCK SAN PABLO AVE   \n",
       "\n",
       "  Incident Type Dispositions  Location - Latitude  Location - Longitude  \n",
       "0             T            M                  NaN                   NaN  \n",
       "1             T            M                  NaN                   NaN  \n",
       "2             T            M                  NaN                   NaN  \n",
       "3          1194       BM4ICN                  NaN                   NaN  \n",
       "4          1194       BM4ICN                  NaN                   NaN  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "stops = pd.read_csv('data/stops.csv', parse_dates=[1], infer_datetime_format=True)\n",
    "stops.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On the other hand, we could have received the Stops data in the following format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Num Incidents</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Call Date/Time</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2015-01-26</th>\n",
       "      <td>46</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-01-27</th>\n",
       "      <td>57</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-01-28</th>\n",
       "      <td>56</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017-04-28</th>\n",
       "      <td>82</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017-04-29</th>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017-04-30</th>\n",
       "      <td>59</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>825 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                Num Incidents\n",
       "Call Date/Time               \n",
       "2015-01-26                 46\n",
       "2015-01-27                 57\n",
       "2015-01-28                 56\n",
       "...                       ...\n",
       "2017-04-28                 82\n",
       "2017-04-29                 86\n",
       "2017-04-30                 59\n",
       "\n",
       "[825 rows x 1 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(stops\n",
    " .groupby(stops['Call Date/Time'].dt.date)\n",
    " .size()\n",
    " .rename('Num Incidents')\n",
    " .to_frame()\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case, each record in the table corresponds to a single date instead of a single incident. We would describe this table as having a coarser granularity than the one above. It's important to know the granularity of your data because it determines what kind of analyses you can perform. Generally speaking, too fine of a granularity is better than too coarse; while we can use grouping and pivoting to change a fine granularity to a coarse one, we have few tools to go from coarse to fine."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Granularity Checklist\n",
    "\n",
    "You should have answers to the following questions after looking at the granularity of your datasets. We will answer them for the Calls and Stops datasets.\n",
    "\n",
    "**What does a record represent?**\n",
    "\n",
    "In the Calls dataset, each record represents a single case of a police call. In the Stops dataset, each record represents a single incident of a police stop.\n",
    "\n",
    "**Do all records capture granularity at the same level? (Sometimes a table will contain summary rows.)**\n",
    "\n",
    "Yes, for both Calls and Stops datasets.\n",
    "\n",
    "**If the data were aggregated, how was the aggregation performed? Sampling and averaging are are common aggregations.**\n",
    "\n",
    "No aggregations were performed as far as we can tell for the datasets. We do keep in mind that in both datasets, the location is entered as a block location instead of a specific address.\n",
    "\n",
    "**What kinds of aggregations can we perform on the data?**\n",
    "\n",
    "For example, it's often useful to aggregate individual people to demographic groups or individual events to totals across time.\n",
    "\n",
    "In this case, we can aggregate across various granularities of date or time. For example, we can find the most common hour of day for incidents with aggregation. We might also be able to aggregate across event locations to find the regions of Berkeley with the most incidents."
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  },
  "toc": {
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
